Diagnostic Accuracy of Ultrasound and Fine-Needle Aspiration Cytology in Thyroid Malignancy

Introduction: Thyroid nodule incidence is increasing due to the widespread use of ultrasonography. Fine-needle aspiration cytology is widely applied for the detection of malignancy. The aim of this study was to evaluate the predictive value of ultrasonography in thyroid cancer. Methods: This retrospective study included patients who underwent total thyroidectomy for benign thyroid disease or well-differentiated thyroid carcinoma from January 2017 to December 2022. The study population was divided into two groups: the well-differentiated thyroid cancer group and a control group with benign histopathological reports. Results: In total, 192 patients were enrolled in our study; 159 patients were included in the well-differentiated thyroid cancer group and 33 patients in the control group. Statistical analysis demonstrated that ultrasonographic findings such as microcalcifications (90.4%), hypoechogenicity (89.3%), irregular margins (92.2%) and taller-than-wide shape (90.5%) were correlated with malignancy (p < 0.001). Uni- and multivariate analysis revealed that both the US score (OR: 2.177; p < 0.001) and the Bethesda System (OR: 1.875; p = 0.002) could predict malignancy. In terms of diagnostic accuracy, the US score displayed higher sensitivity (64.2% vs. 33.3%) and a better negative predictive value (34.5% vs. 24.4%) than the Bethesda score, while both scoring systems displayed comparable specificities (90.9% vs. 100%) and positive predictive values (97.1% vs. 100%). Discussion: The malignant potential of thyroid nodules is a crucial question that guides the decision for surgery. Ultrasonography and fine-needle aspiration cytology are pivotal examinations in the diagnostic process, with ultrasonography demonstrating a better negative predictive value.


Introduction
The incidence of thyroid nodules (TNs) has been increasing over recent years due to the wide use of ultrasonography (US) and other imaging tests. Reportedly, TNs can be found in 2-6% of the population on palpation, in 19-35% on ultrasound and in 8-65% of autopsy series [1]. Since the incidence of thyroid malignancy has been reported as 5.4% in men and 6.5% in women, only a small fraction of these nodules prove to be malignant [2]. Fine-needle aspiration cytology (FNAc) is a widely accepted method for the evaluation of TNs and the detection of malignancy, with reported accuracy rates that vary between studies and exceed 90% in some reports [3]. FNAc has indisputably contributed to a decrease in the number of unnecessary thyroid surgeries and to an increase in the preoperative diagnosis of malignant thyroid lesions. The 2023 Bethesda System for Reporting Thyroid Cytopathology recommends six reporting categories: (i) nondiagnostic; (ii) benign; (iii) atypia of undetermined significance (AUS); (iv) follicular neoplasm; (v) suspicious for malignancy (SFM); and (vi) malignant [4]. Factors that can influence FNAc results include a poorly accessible nodule position, operator experience, nodule size and composition, and experience in cytological interpretation.
Medicina 2024, 60, 722
The aim of this study was to evaluate the predictive value of ultrasonographic features for malignant TNs and to assess the diagnostic performance of these features in thyroid cancer patients.

Study Population
Approval for this retrospective study was obtained from the institutional review board of our hospital (Scientific Council of Theageneio Cancer Hospital, Thessaloniki, Greece, 2661/22 February 2024), and informed consent was obtained from all patients. From January 2017 to December 2022, all patients older than 18 years who underwent total thyroidectomy at our institution with a histopathology report of a benign thyroid gland or well-differentiated thyroid carcinoma were eligible for the study. Exclusion criteria were insufficient data on the preoperative evaluation, previous thyroid surgery, and a pathology or cytology report of any malignancy other than well-differentiated thyroid cancer.
The study population was divided into two groups: the malignant group (WDTC group), which included patients with a histopathological diagnosis of well-differentiated thyroid cancer following thyroidectomy, and the control group (NC group), consisting of patients with a benign histopathology report. All patients included in this study underwent a preoperative head and neck ultrasound and an ultrasound-guided FNAc (US-FNAc) of the suspicious nodules.

Ultrasonography and US-FNAc
A complete thyroid and neck ultrasound was performed preoperatively in all patients. The examinations were conducted by experienced radiologists (more than 5 years of experience). A US scoring system was implemented in accordance with the 2015 ATA Guidelines, associating thyroid nodules with a risk of malignancy based on their sonographic pattern. The thyroid nodules were assessed for the following ultrasonographic features: size, microcalcifications, increased vascularity, hypoechogenicity, taller-than-wide shape, irregular margins, extrathyroidal extension and solid composition. Each of these features was assigned one point in the scoring system, reported as the US score. For size, one point was given when the nodule measured more than 10 mm.
FNAc was performed with a non-aspiration technique using a 23-gauge needle attached to a 5 mL syringe. The samples were evaluated by experienced cytopathologists who were blinded to the US findings. The FNAc results were classified into the following six categories according to the Bethesda System for Reporting Thyroid Cytopathology: (1) nondiagnostic or unsatisfactory (Bethesda category I), (2) benign (Bethesda category II), (3) atypia of undetermined significance or follicular lesion of undetermined significance (AUS/FLUS) (Bethesda category III), (4) follicular neoplasm or suspicious for a follicular neoplasm (Bethesda category IV), (5) suspicious for malignancy (Bethesda category V) and (6) malignant (Bethesda category VI). In nodules with both cystic and solid components, the solid part of the nodule was aspirated. When a multinodular goiter was present, both the largest nodule and the nodule with the most suspicious characteristics were aspirated.
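The scoring rule described above, one point per suspicious feature with size counting only above 10 mm (a maximum of 8 points), can be sketched as follows. This is a minimal illustration, not the authors' software; the `Nodule` record and its field names are hypothetical, and coding each feature as simply present or absent is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Nodule:
    """Hypothetical record of the US features assessed for one nodule."""
    size_mm: float
    microcalcifications: bool
    increased_vascularity: bool
    hypoechogenicity: bool
    taller_than_wide: bool
    irregular_margins: bool
    extrathyroidal_extension: bool
    solid_composition: bool


def us_score(n: Nodule) -> int:
    """One point per suspicious feature; size scores only when > 10 mm."""
    binary_features = [
        n.microcalcifications,
        n.increased_vascularity,
        n.hypoechogenicity,
        n.taller_than_wide,
        n.irregular_margins,
        n.extrathyroidal_extension,
        n.solid_composition,
    ]
    # bool is a subclass of int in Python, so sum() counts the True values.
    return int(n.size_mm > 10) + sum(binary_features)


example = Nodule(
    size_mm=21.0,
    microcalcifications=True,
    increased_vascularity=False,
    hypoechogenicity=True,
    taller_than_wide=False,
    irregular_margins=True,
    extrathyroidal_extension=False,
    solid_composition=True,
)
print(us_score(example))  # 5
```

Note that a nodule of exactly 10 mm earns no size point under the "more than 10 mm" rule.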

Data and Statistical Analysis
Data were analyzed using the Statistical Package for the Social Sciences 25.0 (SPSS Inc., Chicago, IL, USA) and R software (version 3.6.2). Relationships with a two-sided p-value of less than 0.05 were considered statistically significant. The reference standard for malignancy was the histopathology report. When both the largest nodule and the nodule with the most suspicious characteristics were aspirated and assessed, the data were analyzed cumulatively, treating the two nodules as independent. Continuous variables are presented as means with standard deviation (SD) or as medians with interquartile range (IQR), depending on whether normality could be assumed, while categorical variables are presented as frequencies with percentages (%). The chi-square (χ²) test and Fisher's exact test were used to compare malignancy rates across categorical US features. Independent samples t-tests and the non-parametric Mann-Whitney test were used to evaluate the relationship between continuous variables and malignancy. Univariate and multivariate logistic regression analyses were performed to predict the probability of cancer for both the US score and the Bethesda System. After univariate logistic analysis of every candidate factor, factors with a p-value of less than 0.20 were entered into the multivariate logistic regression model, and the final model was built using stepwise backward elimination with a significance level of 0.05. Gender and age at the time of surgery were both included in the final model.
In addition, a receiver operating characteristic (ROC) curve was generated to determine the optimal cut-off point of the US score for thyroid malignancy, which was chosen based on Youden's index; the corresponding sensitivity and specificity were also calculated.
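The variable-selection procedure described above can be sketched as follows, assuming NumPy is available. This is not the authors' SPSS/R workflow: the Newton-Raphson fit, the use of Wald p-values for both screening and elimination, and the names `fit_logit` and `select_model` are illustrative assumptions, and the data are synthetic. Age and gender are forced into every model, candidates enter if their univariate p-value is below 0.20, and the least significant candidate is removed until every remaining one has p < 0.05.

```python
import math

import numpy as np


def fit_logit(X, y, n_iter=50, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson.

    X must already contain an intercept column. Returns the coefficients and
    their two-sided Wald p-values.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))  # information^-1 @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    pvals = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    return beta, pvals


def select_model(data, y, candidates, forced, enter_p=0.20, stay_p=0.05):
    """Univariate screening, then backward elimination; `forced` terms always stay."""

    def design(cols):
        return np.column_stack([np.ones(len(y))] + [data[c] for c in cols])

    # Step 1: keep candidates whose univariate p-value is below enter_p.
    kept = [c for c in candidates if fit_logit(design([c]), y)[1][1] < enter_p]

    # Step 2: repeatedly drop the least significant non-forced term.
    while True:
        cols = forced + kept
        beta, pvals = fit_logit(design(cols), y)
        # Index 0 is the intercept, so column c sits at 1 + cols.index(c).
        removable = {c: pvals[1 + cols.index(c)] for c in kept}
        if not removable:
            break
        worst = max(removable, key=removable.get)
        if removable[worst] < stay_p:
            break
        kept.remove(worst)

    odds_ratios = dict(zip(cols, np.exp(beta[1:])))
    return cols, odds_ratios


# Synthetic, hypothetical data (not the study data).
rng = np.random.default_rng(0)
n = 400
data = {
    "age": rng.normal(52.0, 13.0, n),
    "gender": rng.integers(0, 2, n).astype(float),
    "us_score": rng.integers(0, 9, n).astype(float),
    "noise": rng.normal(0.0, 1.0, n),
}
true_logit = -3.0 + 0.8 * data["us_score"]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

cols, odds_ratios = select_model(
    data, y, candidates=["us_score", "noise"], forced=["age", "gender"]
)
print(cols, round(odds_ratios["us_score"], 2))
```

Because the simulated outcome depends only on `us_score`, that term should survive elimination with an odds ratio near exp(0.8), while the pure-noise column usually does not.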
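The cut-off selection described above can be illustrated with a small pure-Python sketch: it enumerates candidate cut-offs, computes sensitivity and specificity at each, picks the one maximizing Youden's index, and computes the AUC as the probability that a malignant case outscores a benign one. The toy scores are hypothetical, not study data, and taking midpoints between adjacent observed scores as candidate cut-offs is an assumption that would explain a half-point cut-off such as 3.5 for an integer score.

```python
def roc_points(scores, labels):
    """Return (cut-off, sensitivity, specificity) for each candidate cut-off.

    A case is called positive when its score is >= the cut-off. Candidate
    cut-offs are midpoints between consecutive distinct observed scores.
    """
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = len(labels) - n_pos
    values = sorted(set(scores))
    cutoffs = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    points = []
    for t in cutoffs:
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        tn = sum(1 for s, lab in zip(scores, labels) if s < t and lab == 0)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def youden_cutoff(scores, labels):
    """Pick the cut-off maximizing Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda pt: pt[1] + pt[2] - 1)


def auc(scores, labels):
    """AUC as P(malignant score > benign score), counting ties as 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy scores (1 = malignant on histopathology, 0 = benign).
scores = [1, 2, 3, 4, 3, 4, 5, 6, 7]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(youden_cutoff(scores, labels))  # (4.5, 0.6, 1.0)
print(auc(scores, labels))  # 0.9
```

In this toy example the cut-off 4.5 gives J = 0.6, beating 3.5 (J = 0.55) because it clears every benign case at the cost of two malignant ones.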

Basic Characteristics
One hundred and ninety-two patients were included in the study: 159 in the WDTC group and 33 in the NC group. No statistically significant differences in demographic characteristics were observed between the two groups. Of the 192 patients, 162 were female (84.4%) and 30 were male (15.6%). The mean age at the time of surgery was 52.2 (SD: 13.3) years. The majority of patients in both groups (n = 154, 80.6%) presented with a multinodular goiter (Table 1).

US Features
In total, 246 nodules were evaluated. Nodule size ranged from 8.0 to 58.0 mm, with a median of 21.0 mm (IQR: 17.3 mm). Malignant nodules tended to be slightly smaller than benign ones (p = 0.005). Microcalcifications were present in 125 (50.8%) of the examined nodules, and the majority of these (113, 90.4%) proved to be malignant (p < 0.001). Hypoechogenicity was also strongly associated with malignancy (p ≤ 0.001): 89.3% of the hypoechoic nodules (134 out of 150) were well-differentiated thyroid carcinomas, while 61.0% of the malignant nodules were hypoechoic. Hyperechogenicity was rarely observed in either group. Irregular margins and a taller-than-wide shape were both much more common in malignant nodules (92.2% vs. 7.8% and 90.5% vs. 9.5% of the nodules with each feature, respectively) and were thus highly associated with malignancy (p < 0.0001). In addition, almost 62% of the malignant nodules had a taller-than-wide shape (124 out of 201). No correlation was found between malignancy and the vascularity pattern of the nodules. All nodules that demonstrated extrathyroidal extension on US proved to be thyroid carcinomas, but this feature was rare (4.9% of the nodules examined) (Table 2).
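As an illustration of the tests named in the statistical analysis, the sketch below applies Pearson's chi-square test (without continuity correction) and a two-sided Fisher's exact test to a 2 × 2 table for microcalcifications. The cell counts are back-calculated from this section rather than taken from Table 2, assuming 201 malignant and 45 benign nodules among the 246 evaluated (as implied by "124 out of 201"), and nodules are treated as independent, as in the analysis above; whether the authors applied a continuity correction is not stated.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and its 1-df p-value.

    Table layout: rows = feature present / absent,
    columns = malignant / benign.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value: sum of tables no more likely than observed."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = (prob(x) for x in range(lo, hi + 1))
    # Small relative tolerance so tables tied with the observed one are counted.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))


# Microcalcifications: counts back-calculated from the Results text,
# assuming 201 malignant and 45 benign nodules out of 246.
mc = dict(a=113, b=12, c=88, d=33)
chi2, p_chi2 = chi_square_2x2(**mc)
p_fisher = fisher_exact_2x2(**mc)
share_malignant = mc["a"] / (mc["a"] + mc["b"])  # malignant share of nodules with the feature
print(f"chi2 = {chi2:.2f}, p = {p_chi2:.5f}, Fisher p = {p_fisher:.5f}, "
      f"malignant share = {share_malignant:.1%}")
```

Under these assumptions the chi-square statistic is about 12.85 (p < 0.001), and the malignant share reproduces the reported 90.4%.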

Multivariate Regression Analysis and Diagnostic Performances
Multivariate logistic regression analysis was performed to assess malignancy prediction and to estimate the odds ratios for the US score (and each of its components separately) and for the Bethesda System. Gender and age were included in the final multivariate model for both systems. Both the US score (OR: 2.177; p < 0.0001) and the Bethesda System (OR: 1.875; p = 0.002) were independent risk factors for malignancy (Tables 3 and 4). Notably, the characteristics included in the US score did not contribute equally: the risk score for irregular margins was higher than those for a taller-than-wide shape or hypoechogenicity, whereas nodule size had risk score values near zero. The discriminatory performance of the US score for predicting thyroid nodule malignancy and the respective ROC curve are presented in Figure 1. The US score exhibited a statistically significant and strong discriminatory performance (AUC = 0.784, 95% CI: 0.693-0.875). The optimal cut-off point of the US score was 3.5, with a sensitivity of 79.9% and a specificity of 66.7%.
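For readers interpreting Tables 3 and 4, the reported odds ratios follow the usual logistic-regression relation below. This is a sketch under the assumption, implied but not stated in the text, that the US score entered the model as a per-point numeric term adjusted for age and gender; the symbols are illustrative.

```latex
\begin{aligned}
\operatorname{logit} P(\text{WDTC}) &= \beta_0 + \beta_{\text{US}}\, S_{\text{US}} + \beta_{\text{age}}\,\text{age} + \beta_{\text{gender}}\,\text{gender} \\
\mathrm{OR}_{\text{US}} &= e^{\beta_{\text{US}}} = 2.177 \quad\Rightarrow\quad \beta_{\text{US}} = \ln 2.177 \approx 0.778 \\
\mathrm{OR}_{k\text{-point increase}} &= e^{k\,\beta_{\text{US}}} = 2.177^{k}, \qquad k = 2:\; 2.177^{2} \approx 4.74
\end{aligned}
```

Under this assumption, a nodule scoring two points higher would carry roughly 4.7 times the odds of malignancy, with age and gender held fixed.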

Diagnostic Performances
Diagnostic tests were performed to evaluate the sensitivity, specificity, positive predictive value and negative predictive value of both the US scoring system and the Bethesda System in this study.
The two methods were compared in terms of diagnostic accuracy, sensitivity, specificity, and positive and negative predictive value. While the Bethesda System demonstrated a specificity and PPV of 100%, its sensitivity was as low as 33%. The US scoring system, on the other hand, had a lower specificity and PPV (90.9% and 97.1%, respectively) but a much higher sensitivity (64.2%) and a higher NPV (34.5% vs. 24.4%), with a statistical significance of p < 0.001 (Table 5). The diagnostic accuracy was 68.75% for the US scoring system and 43.75% for the Bethesda System. ROC curve analysis was performed for both systems and the AUC was calculated, with the histopathology report as the reference standard for malignancy. A Z-score test comparing the two AUCs showed a statistically significant difference between the methods: the AUC of the US score was 88.0%, significantly higher than the AUC of the Bethesda System (68.3%; p < 0.001) (Figure 1).
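The metrics in this comparison follow from a 2 × 2 table of test result against histopathology. The sketch below is a minimal illustration; the patient-level counts are hypothetical values back-calculated from the reported US-score percentages (159 WDTC and 33 NC patients), not figures read from Table 5, and how a positive US score was defined for this table is not stated in the text.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 metrics against histopathology as the reference standard."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant cases correctly flagged
        "specificity": tn / (tn + fp),  # benign cases correctly cleared
        "ppv": tp / (tp + fp),  # flagged cases that are truly malignant
        "npv": tn / (tn + fn),  # cleared cases that are truly benign
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical patient-level counts for the US score, back-calculated from the
# reported percentages (tp + fn = 159 WDTC patients, tn + fp = 33 NC patients).
us_metrics = diagnostic_metrics(tp=102, fp=3, fn=57, tn=30)
for name, value in us_metrics.items():
    print(f"{name}: {value:.2%}")
```

With these counts the function returns 64.15%, 90.91%, 97.14%, 34.48% and 68.75%, matching the reported values after rounding. The low NPV follows directly from the 57 false negatives outnumbering the 30 true negatives.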

Discussion
The diagnostic evaluation of thyroid nodules is a difficult process. It involves a careful history and clinical examination, followed by thyroid ultrasound and hormonal tests to assess thyroid function and the presence of autoantibodies. The clinical importance of thyroid nodules hinges on the need to diagnose thyroid cancer, which occurs in 7-15% of cases depending on age, sex, history of radiation exposure, family history, smoking, obesity and other factors [5,11]. Nowadays, about 40% of diagnosed WDTCs are smaller than 1 cm. This shift toward smaller tumors may be due to the increasing use of ultrasonography and other imaging methods and to earlier diagnosis and treatment [12,13]. In a large retrospective study, Chen et al. established a higher incidence of thyroid cancer in thyroid nodules detected by ultrasound screening than in those detected by palpation, and two-thirds of the thyroid nodules believed to be normal were microcarcinomas [13]. Optimizing long-term health outcomes and educating individuals with thyroid neoplasms about their potential prognosis is critically important.
In the international literature, several systematic reviews and meta-analyses have analyzed the diagnostic accuracy of ultrasound and FNA in thyroid malignancy [14,15]. Ospina et al. concluded that the available evidence warrants only limited confidence in the diagnostic accuracy of FNA due to risk of bias, imprecision and inconsistency among studies, but the likelihood for purely benign and purely malignant potential was found to be high [15]. Regarding ultrasound, Remonti et al. reported that single ultrasonographic findings could not predict malignancy on their own, but the combination of microcalcifications, a taller-than-wide shape, irregular margins or the absence of elasticity could offer reliable information on malignant potential [14]. The absence of elasticity had the best ultrasonographic performance for malignant results [14]. Ito et al. reported a positive predictive value of 97.2% for ultrasound, while Nie et al. described ultrasound as a highly accurate examination for discriminating the nature of thyroid nodules, with a specificity of 33.88% and a sensitivity of 92.53% [16,17]. In terms of cytology, FNA was found to have a positive predictive value of 100% and a negative predictive value of 43.75% in a randomized cross-sectional study [18]. Our study therefore retrospectively assessed the diagnostic accuracy of ultrasound and FNA in the diagnosis of thyroid malignancy. Our results showed that ultrasonographic findings such as microcalcifications, hypoechogenicity, irregular margins and a taller-than-wide shape were correlated with malignancy with high statistical significance. Furthermore, our findings are consistent with the available international data on the positive and negative predictive values of FNA and ultrasound scores, although we found ultrasound specificity to be higher and sensitivity to be lower than the values reported in other studies.
Several studies have reported the utility of ultrasonography alone for distinguishing benign from malignant nodules [6,19]. In addition, it is cost-effective, widely available and non-invasive. Therefore, ultrasonography has been adopted as the first step in determining the location and nature of thyroid nodules [16,20-24]. Alshoabi et al. reported that B-mode ultrasonography alone could differentiate benign nodules with excellent diagnostic accuracy [20]. In this context, we examined the sonographic characteristics of thyroid nodules both separately and combined into the total US score. We also analyzed the Bethesda score and compared the diagnostic accuracy of the two scores. The uni- and multivariate analyses of both the Bethesda and US scoring systems indicate that both can effectively predict the presence of malignancy in a TN. Interestingly, the US score displayed higher sensitivity and a better negative predictive value than the Bethesda score, while both scoring systems displayed comparable specificities and positive predictive values. Not surprisingly, in the present study, the diagnostic accuracy of the US score was superior to that of the Bethesda score.
When looking closely at the US score and its various components, we observed that microcalcifications, hypoechogenicity, cystic elements, a taller-than-wide shape and irregular margins are each strongly correlated with the malignant potential of a TN. This is consistent with other studies in the literature: Nabahati et al. and Ram et al. proposed the same parameters as indicators of malignancy [19,21]. On a clinical basis, this means that a patient presenting with one of these characteristics should be considered as having a possible malignancy and managed accordingly.
Moreover, the uni- and multivariate logistic regression analysis of the US score showed that, regardless of the other characteristics, irregular margins, hypoechogenicity and a taller-than-wide shape are strong predictors of malignancy. On the contrary, nodule size does not seem to play a role in the malignant potential of a TN, in agreement with Rahimi et al., who concluded that nodule size should not be a criterion for malignancy, whereas irregular edges, solid composition, hypoechogenicity and being a single nodule are major components of malignancy [24]. In the analysis of the Bethesda System, nodule size seems to have a pivotal predictive role in malignancy. Thyroid cytology is prone to false-positive and false-negative results, for example through the misinterpretation of cystic degeneration and squamous cells in partially cystic lesions or the underestimation of architectural and cellular features in follicular patterns, and its reported sensitivity (65-99%) and specificity (72-100%) vary widely [25]. Zhu et al. found that sampling error (86.7%) was the most common cause of false-negative FNA diagnoses, mainly due to nodule size, while interpretation error (80.9%) was the most common cause of false-positive diagnoses, owing to overlapping cytological features in adenomatous hyperplasia, thyroiditis and cystic lesions [26]. In our study, the Bethesda System demonstrated a specificity and PPV of 100%, while its sensitivity was as low as 33% and its diagnostic accuracy was 43.75%.
This study has important clinical implications. First, the results of our retrospective study reinforce the view that ultrasound and FNA biopsy can guide the decision on the surgical management of suspicious thyroid nodules. Ultrasound has high specificity and positive predictive value for malignant potential, although we found ultrasound sensitivity to be lower than previously reported values. FNA biopsy also has high specificity and positive predictive value for thyroid malignancy. Compared with each other, ultrasound scored higher than FNA biopsy, especially in terms of sensitivity and negative predictive value, which are limited in both examinations. This new finding could offer new perspectives on the management of suspicious thyroid nodules. When ultrasound and FNA biopsy results are discordant, we propose that ultrasound findings are more reliable than FNA biopsy for suspicious thyroid nodules, provided that the ultrasound is performed by a radiologist fully experienced in thyroid imaging.

Limitations
A major limitation of our study is its retrospective data collection. We applied strict inclusion criteria to lower the risk of selection bias.

Conclusions
The diagnostic evaluation of thyroid nodules is a difficult process. Sonographic findings determine the indication for fine-needle aspiration cytology, and combined approaches guide the diagnostic process for thyroid malignancy. Our study demonstrates that the US score has higher sensitivity and a better negative predictive value than the Bethesda score, while both scoring systems displayed comparable specificities and positive predictive values. Ultrasound seems to be more reliable in predicting thyroid malignancy, with microcalcifications, hypoechogenicity, irregular margins and a taller-than-wide shape being major factors of malignant potential.

Figure 1. Receiver operating characteristic (ROC) curves of the Bethesda System and US score.

Table 1. Basic characteristics of the study population.

Table 2. Ultrasound characteristics of the thyroid nodules.

Table 3. Logistic models of the US score for predicting malignancy (WDTC group vs. NC group).
Abbreviations: β, coefficient of the explanatory variable; SE, standard error; OR, odds ratio; CI, confidence interval; RS, risk score.

Table 4. Logistic models of the Bethesda System for predicting malignancy (WDTC group vs. NC group).
Abbreviations: β, coefficient of the explanatory variable; SE, standard error; OR, odds ratio; CI, confidence interval; RS, risk score.

Table 5. Diagnostic accuracy of the US score and the Bethesda System in the diagnosis of cancer.